Abstract. We show that any finite monoid or semigroup presentation 
satisfying the small overlap condition C(4) has word problem which is a 
deterministic rational relation. It follows that the set of lexicographically 
minimal words forms a regular language of normal forms, and that these 
normal forms can be computed in linear time. We also deduce that 
C(4) monoids and semigroups are rational (in the sense of Sakarovitch),
asynchronous automatic, and word hyperbolic (in the sense of Duncan
and Gilman). From this it follows that C(4) monoids satisfy analogues of
Kleene's theorem, and admit decision algorithms for the rational subset 
and finitely generated submonoid membership problems. We also prove 
some automata-theoretic results which may be of independent interest. 

1. Introduction 

Small overlap conditions are natural combinatorial conditions on monoid 
and semigroup presentations, which serve to limit the complexity of deriva- 
tion sequences between equivalent words. They are the natural semigroup- 
theoretic analogues of the small cancellation conditions extensively
employed in combinatorial and geometric group theory [15]. It has long been
known that monoids with presentations satisfying the condition C(3) have 
decidable word problem [U \TT\ [18] ; recent research of the author [13] has 
shown that the slightly stronger condition C(4) implies that the word
problem is solvable in linear time on a 2-tape Turing machine.

In this paper, we take an automata-theoretic approach to the study of
small overlap semigroups and monoids. Our main result is that the word 
problem for any C(4) monoid or semigroup presentation is a deterministic 
rational relation (and moreover, effectively computable as such). It follows 
from results of automata theory [12] that the set of all words which
are lexicographically minimal in their equivalence classes forms a regular 
language of normal forms, and that a normal form for any element can be 
computed in linear time. We are also able to deduce that every monoid or 
semigroup admitting a presentation satisfying the condition C(4) is rational 
(in the sense of Sakarovitch [19]) and hence also asynchronous automatic,
and word hyperbolic (in the sense of Duncan and Gilman [3]). Another 
consequence is that C(4) monoids satisfy an analogue of Kleene's theorem 
(see for example [10]): their rational subsets coincide with their recognisable
subsets. It follows also that membership is uniformly decidable for rational
subsets, and hence also for finitely generated submonoids, of such monoids. 

In addition to this introduction, this article comprises four sections.
Section 2 briefly reviews the definitions of monoid and semigroup presentations,
and of small overlap conditions. Section 3 contains some purely
automata-theoretic results which will be used to establish our main results, and may
be of some independent interest. In Section 4 we combine the results of the
previous section with those of [13] to prove our main theorem. Finally, in
Section 5 we deduce some consequences.

2. Preliminaries 

In this section we briefly recall the key definitions of semigroup and 
monoid presentations and of small overlap conditions, which will be used 
in the rest of this paper. 

Let A be a finite alphabet (set of symbols). A word over A is a finite 
sequence of zero or more elements from A. The set of all words over A is 
denoted A*; under the operation of concatenation it forms a monoid, called 
the free monoid on A. The length of a word w ∈ A* is denoted |w|. The
unique empty word of length 0 is denoted ε; it forms the identity element of
the monoid A*. The set A* \ {ε} of non-empty words forms a subsemigroup
of A*, called the free semigroup on A and denoted A^+. For k ∈ ℕ we write
A^k, A^{≤k} and A^{<k} to denote the set of words in A* of length respectively
exactly k, less than or equal to k, and strictly less than k. If w ∈ A* is a
word, we write w^R to denote the reverse of w, that is, the word composed
of the letters of w written in reverse order.

A finite monoid presentation ⟨A | R⟩ consists of a finite alphabet A (the
letters of which are called generators), together with a finite set R ⊆ A* × A*
of pairs of words (called relations). We say that u, v ∈ A* are one-step
equivalent if u = axb and v = ayb for some possibly empty words a, b ∈ A* and
relation (x, y) ∈ R or (y, x) ∈ R. We say that u and v are equivalent, and
write u ≡_R v or just u ≡ v, if there is a finite sequence of words beginning
with u and ending with v, each term of which but the last is one-step
equivalent to its successor. Equivalence is clearly an equivalence relation; in fact
it is the least equivalence relation containing R and compatible with the
multiplication in A*. We write [u] for the equivalence class of a word u ∈ A*.
The equivalence classes form a monoid with multiplication well-defined by
[u][v] = [uv]; this is called the monoid presented by the presentation.
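
As an informal illustration of one-step equivalence, the following Python sketch enumerates the words one-step equivalent to a given word and searches breadth-first for a derivation between two words. It is not part of the original development: the function names and the length cap max_len are assumptions introduced here. Because the search is truncated at max_len, a negative answer shows only that no derivation exists within the cap.

```python
from collections import deque


def one_step_neighbours(word, relations):
    """Yield every word obtained from `word` by one application of a relation."""
    for x, y in relations:
        # A relation (x, y) may be applied in either direction.
        for lhs, rhs in ((x, y), (y, x)):
            start = word.find(lhs)
            while start != -1:
                yield word[:start] + rhs + word[start + len(lhs):]
                start = word.find(lhs, start + 1)


def equivalent_within(u, v, relations, max_len):
    """Search for a derivation from u to v through words of length <= max_len.

    Returns True if such a derivation is found. False means only that no
    derivation exists inside the length cap (an assumption of this sketch).
    """
    seen = {u}
    queue = deque([u])
    while queue:
        w = queue.popleft()
        if w == v:
            return True
        for n in one_step_neighbours(w, relations):
            if len(n) <= max_len and n not in seen:
                seen.add(n)
                queue.append(n)
    return False
```

For the presentation with generators a, b and the single relation (ab, ba), calling equivalent_within("aab", "baa", [("ab", "ba")], 3) explores the derivation aab, aba, baa.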

The word problem for a (fixed) monoid presentation ⟨A | R⟩ is the
algorithmic problem of, given as input two words u, v ∈ A*, deciding whether
u ≡_R v.

Definitions corresponding to all of those above can also be made for
semigroups (without necessarily an identity element), by taking A^+ in place of
A* (in all places except the definition of one-step equivalence, where a and
b must still be allowed to be empty).

Now suppose we have a fixed monoid or semigroup presentation ⟨A | R⟩.
We begin by recalling some basic definitions from the theory of small overlap 
conditions [HJ [T7] . A relation word is a word which appears as one side of a 
relation in R. A piece is a word which appears more than once as a factor
in the relations, either as a factor of two different relation words, or as a
factor of the same relation word in two different (but possibly overlapping)
places. Let m ∈ ℕ be a positive integer. The presentation is said to satisfy
C(m) if no relation word can be written as a product of strictly fewer than
m pieces. Thus C(1) says that no relation word is empty (which in the
semigroup case is a trivial requirement); C(2) says that no relation word is
a factor of another.
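
The definitions of pieces and of the conditions C(m) can be checked mechanically for small presentations. The following Python sketch is an illustration under stated assumptions: the function names are hypothetical, relation words are taken to be the distinct words occurring as sides of relations, and the empty word is counted as a piece by convention. It computes the set of pieces by recording every occurrence of every factor, and then uses dynamic programming to find the least number of pieces whose product is a given relation word.

```python
def relation_words(relations):
    """The set of distinct words occurring as a side of some relation."""
    return {side for pair in relations for side in pair}


def pieces(relations):
    """All pieces of the presentation, including the empty word."""
    occurrences = {}
    for w in relation_words(relations):
        for i in range(len(w)):
            for j in range(i + 1, len(w) + 1):
                # Record where the factor w[i:j] occurs: (relation word, start).
                occurrences.setdefault(w[i:j], set()).add((w, i))
    found = {f for f, places in occurrences.items() if len(places) >= 2}
    found.add("")  # the empty word is a piece by convention
    return found


def min_pieces(word, piece_set):
    """Least number of non-empty pieces with product `word`, or None if none."""
    best = [None] * (len(word) + 1)
    best[0] = 0  # the empty word is the product of zero pieces
    for j in range(1, len(word) + 1):
        for i in range(j):
            if best[i] is not None and word[i:j] in piece_set:
                if best[j] is None or best[i] + 1 < best[j]:
                    best[j] = best[i] + 1
    return best[len(word)]


def satisfies_C(relations, m):
    """True if no relation word is a product of strictly fewer than m pieces."""
    piece_set = pieces(relations)
    for w in relation_words(relations):
        n = min_pieces(w, piece_set)
        if n is not None and n < m:
            return False
    return True
```

For example, in the presentation with the single relation (abcd, dcba) the only non-empty pieces are the four letters, so each relation word needs four pieces and the presentation satisfies C(4) but not C(5).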

Retaining our fixed presentation, we now recall some more specialist
terminology from [13]. For each relation word R, let X_R and Z_R denote
respectively the longest prefix of R which is a piece, and the longest suffix of R
which is a piece. If the presentation satisfies C(3) then R cannot be written
as a product of two pieces, so this prefix and suffix cannot meet; thus, R
admits a factorisation X_R Y_R Z_R for some non-empty word Y_R. If moreover the
presentation satisfies the stronger condition C(4) then R cannot be written
as a product of three pieces, so Y_R is not a piece. The converse also holds: a
C(3) presentation such that no Y_R is a piece is a C(4) presentation. We call
X_R, Y_R and Z_R the maximal piece prefix, the middle word and the maximal
piece suffix respectively of R.
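
The factorisation of a relation word into maximal piece prefix, middle word and maximal piece suffix can be computed directly from the set of pieces. The Python sketch below is illustrative only; the names pieces and xyz_factorisation are assumptions introduced here, and the empty word is again treated as a piece. The function returns None when the maximal piece prefix and suffix meet or overlap, which cannot happen for a relation word of a C(3) presentation.

```python
def pieces(relation_words):
    """Set of pieces (including the empty word) of the given relation words."""
    occurrences = {}
    for w in set(relation_words):
        for i in range(len(w)):
            for j in range(i + 1, len(w) + 1):
                occurrences.setdefault(w[i:j], set()).add((w, i))
    return {""} | {f for f, places in occurrences.items() if len(places) >= 2}


def xyz_factorisation(R, piece_set):
    """Return (X, Y, Z) with R = XYZ, X the longest piece prefix of R and
    Z the longest piece suffix of R, or None if they meet or overlap."""
    x_len = max(n for n in range(len(R) + 1) if R[:n] in piece_set)
    z_len = max(n for n in range(len(R) + 1) if R[len(R) - n:] in piece_set)
    if x_len + z_len >= len(R):
        return None  # no non-empty middle word, so C(3) fails for R
    return R[:x_len], R[x_len:len(R) - z_len], R[len(R) - z_len:]
```

With relation words abcd and dcba this gives X = a, Y = bc, Z = d for the first word; the middle word bc is not a piece, as the C(4) condition requires.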

If R is a relation word we write R̄ for the (necessarily unique, as a result of
the small overlap condition) word such that (R, R̄) or (R̄, R) is a relation in
the presentation. We write X̄_R, Ȳ_R and Z̄_R for X_{R̄}, Y_{R̄} and Z_{R̄} respectively.
(This is an abuse of notation since, for example, the word X_R may be a
maximal piece prefix of two distinct relation words, but we shall be careful
to ensure that the meaning is clear from the context.)

A relation prefix of a word is a prefix which admits a (necessarily unique,
as a consequence of the small overlap condition) factorisation of the form
aXY where X and Y are the maximal piece prefix and middle word
respectively of some relation word XYZ. An overlap prefix (of length n) of
a word u is a relation prefix which admits an (again necessarily unique)
factorisation of the form bX_1Y_1′X_2Y_2′ ⋯ X_nY_n where

• n ≥ 1;

• bX_1Y_1′X_2Y_2′ ⋯ X_nY_n has no factor of the form X_0Y_0, where X_0 and
Y_0 are the maximal piece prefix and middle word respectively of some
relation word, beginning before the end of the prefix b;

• for each 1 ≤ i ≤ n, R_i = X_iY_iZ_i is a relation word with X_i and Z_i
the maximal piece prefix and suffix respectively; and

• for each 1 ≤ i < n, Y_i′ is a proper, non-empty prefix of Y_i.

Let u ∈ A* be a word and let p be a piece. We say that u is p-active if
pu has a relation prefix aXY with |a| < |p|, and p-inactive otherwise.
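
The notion of a p-active word amounts to a simple search. The following Python sketch is illustrative only: the function name is_p_active and the parameter xy_pairs are assumptions introduced here, where xy_pairs lists the pairs (X, Y) of maximal piece prefix and middle word over all relation words. It tests whether some XY occurs in pu starting at a position strictly inside p.

```python
def is_p_active(u, p, xy_pairs):
    """True if pu has a relation prefix aXY with |a| < |p|.

    xy_pairs: iterable of (X, Y) pairs, one per relation word XYZ.
    """
    w = p + u
    pairs = list(xy_pairs)
    # The factor XY must begin at a position a with 0 <= a < len(p).
    return any(w.startswith(X + Y, a) for a in range(len(p)) for X, Y in pairs)
```

For the relation words abcd and dcba, with pairs (a, bc) and (d, cb), the word cb is d-active because dcb begins with the factor d·cb, whereas bc is not. No word is ε-active, since there is no position a with |a| < 0.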

We now recall some basic definitions from automata theory. If A is an
alphabet, we denote by A^$ the alphabet A ∪ {$} where $ is a new symbol
not in A. The symbol $ will be used as an end-marker for certain types of
automata. If R ⊆ A_1^* × A_2^* is a relation, we denote by R^$ the set

\[ R^{\$} = R(\$,\$) = \{(u\$, v\$) \mid (u,v) \in R\} \subseteq A_1^*\$ \times A_2^*\$ \subseteq (A_1^{\$})^* \times (A_2^{\$})^*. \]

A rational transducer from an alphabet A_1 to an alphabet A_2 is a finite
directed graph with edges labelled by elements of A_1^* × A_2^*, together with a
distinguished initial vertex and a set of distinguished terminal vertices. The
labelling of edges extends to a labelling of paths via the multiplication in
the direct product monoid A_1^* × A_2^*. A pair (u, v) ∈ A_1^* × A_2^* is accepted by
the transducer if it labels some path from the initial vertex to a terminal 
vertex. The relation accepted by the transducer is the set of all pairs ac- 
cepted. A relation accepted by some transducer is called a rational relation 
or rational transduction. Transductions, which were introduced in [3], are of 
fundamental importance in the theory of formal languages and automata; a 
detailed study can be found in pp. 

A deterministic 2-tape finite automaton consists of two alphabets A_1 and
A_2, a finite state set Q partitioned into two disjoint subsets Q_1 and Q_2 with
a distinguished initial state and set of distinguished terminal states, and for
each i = 1, 2 a partial function

\[ \delta_i : Q_i \times A_i^{\$} \to Q. \]

Let ⊢ be the smallest binary relation on (A_1^$)^* × (A_2^$)^* × Q such that

• (au, v, p) ⊢ (u, v, q) for all a ∈ A_1^$, u ∈ (A_1^$)^*, v ∈ (A_2^$)^*, p ∈ Q_1, q ∈ Q
such that δ_1(p, a) is defined and equal to q; and

• (u, bv, p) ⊢ (u, v, q) for all b ∈ A_2^$, u ∈ (A_1^$)^*, v ∈ (A_2^$)^*, p ∈ Q_2, q ∈ Q
such that δ_2(p, b) is defined and equal to q;

and let ⊢* be the reflexive, transitive closure of ⊢. We say that a pair
(u, v) ∈ A_1^* × A_2^* is accepted by the automaton if there exists an initial
state q_0 and a terminal state q_1 such that (u$, v$, q_0) ⊢* (ε, ε, q_1).
Once again, the relation accepted by the automaton is the set of all pairs
accepted.
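
To illustrate how a deterministic 2-tape automaton processes its input, the following Python sketch simulates one on a pair of words. It is an informal illustration: the encoding of the transition functions as dictionaries, the function names, and the example automaton (which accepts the equality relation over the alphabet {a, b}) are all assumptions introduced here. The state determines which tape is read next, and each step consumes exactly one symbol, so the simulation always terminates.

```python
END = "$"  # the end-marker appended to both input words


def accepts(q1_states, delta1, delta2, initial, terminals, u, v):
    """Simulate a deterministic 2-tape automaton on the pair (u, v).

    q1_states: the states that read tape 1; all other states read tape 2.
    delta1, delta2: dicts mapping (state, symbol) to the next state.
    """
    tape1, tape2, state = u + END, v + END, initial
    while tape1 or tape2:
        reads_first = state in q1_states
        tape = tape1 if reads_first else tape2
        delta = delta1 if reads_first else delta2
        if not tape or (state, tape[0]) not in delta:
            return False  # no move is possible before both tapes are consumed
        state = delta[(state, tape[0])]
        if reads_first:
            tape1 = tape1[1:]
        else:
            tape2 = tape2[1:]
    return state in terminals


# Example: the equality relation {(w, w)} over {a, b}.
Q1 = {"read"}
DELTA1 = {("read", "a"): "saw_a", ("read", "b"): "saw_b", ("read", END): "saw_end"}
DELTA2 = {("saw_a", "a"): "read", ("saw_b", "b"): "read", ("saw_end", END): "done"}


def equal_words(u, v):
    return accepts(Q1, DELTA1, DELTA2, "read", {"done"}, u, v)
```

The example alternates between the tapes, reading one symbol from the first and then checking that the second carries the same symbol, and it reaches the terminal state only after both end-markers have been matched.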

A relation is called a deterministic rational relation if it is accepted by a 
deterministic 2-tape automaton, and a reverse deterministic rational relation 
if the relation 

\[ \{(u^R, v^R) \mid (u, v) \in R\} \]
is accepted by a deterministic 2-tape automaton. In general, a deterministic
rational relation need not be reverse deterministic rational [5, Theorem 1].
Every [reverse] deterministic rational relation is accepted by a transducer [5] 
and so is indeed a rational relation. The following elementary proposition 
gives a partial converse to this statement; the general idea is well known 
but the precise formulation we need does not seem to have appeared in the 
literature, so for completeness we give an outline proof. 

Proposition 1. Let R ⊆ A_1^* × A_2^* be a relation and suppose R^$ is accepted
by a transducer with the property that for every state q, one of the following
(mutually exclusive) conditions holds:

(i) q has an edge leaving it, and every edge leaving q has the form (a, ε)
for some a ∈ A_1^$, and there is at most one such edge for each a ∈ A_1^$;

(ii) q has an edge leaving it, and every edge leaving q has the form (ε, a)
for some a ∈ A_2^$, and there is at most one such edge for each a ∈ A_2^$;

(iii) there are no edges leaving q;

(iv) there is exactly one edge leaving q, and that edge has label (ε, ε).

Then R is accepted by a deterministic 2-tape automaton.

Proof. Let M be the transducer accepting R^$ with the given property, and
let Q be the state set of M. Notice that for each state q, there is at most
one state, which we call q̄, with the property that there is a path from q
to q̄ labelled (ε, ε) and q̄ satisfies condition (i) or (ii) in the statement of
the proposition. Since (i) and (ii) are mutually exclusive, we may choose a
partition Q = Q_1 ∪ Q_2 of Q into disjoint subsets such that for every q ∈ Q
with q̄ defined we have that q̄ satisfies condition (i) if and only if q ∈ Q_1,
and similarly q̄ satisfies condition (ii) if and only if q ∈ Q_2. (States q for
which q̄ is not defined may be assigned arbitrarily to either Q_1 or Q_2.)

We now define a new deterministic 2-tape automaton N as follows. The
two tape alphabets of N are A_1 and A_2. The state set of N is the state set Q
of M partitioned into the subsets Q_1 and Q_2 constructed above. The initial
state of N is the initial state of M. The terminal states of N consist of all
states p ∈ Q such that M has a path from p to a terminal state with label
(ε, ε). For each a ∈ A_1^$, p ∈ Q_1 and q ∈ Q we set δ_1(p, a) = q if and only if
p̄ is defined and M has an edge from p̄ to q with label (a, ε). Similarly, for
each a ∈ A_2^$, p ∈ Q_2 and q ∈ Q we set δ_2(p, a) = q if and only if p̄ is defined
and M has an edge from p̄ to q with label (ε, a). It follows directly from the
criteria on the transducer M that each δ_i is a well-defined partial function from
Q_i × A_i^$ to Q.

It is now a routine matter to verify that the deterministic 2-tape
automaton N accepts a pair (u, v) if and only if M accepts (u$, v$). □

3. Prefix-Rewriting Automata 

In this section, we study a type of automaton called a 2-tape
prefix-rewriting automaton. We show that any relation accepted by a
[deterministic] 2-tape prefix-rewriting automaton with a certain property called bounded
expansion is a [deterministic] rational relation. In Section 4 we shall apply
this result to show that the word problem for a C(4) monoid presentation
is a deterministic rational relation.

Let k ∈ ℕ and let A_1 and A_2 be finite alphabets. A k-prefix-rewriting
automaton from A_1 to A_2 is a finite directed graph with edges labelled by
elements of

\[ \bigl((A_1^k \times A_1^*) \cup (A_1^{<k}\$ \times A_1^*\$)\bigr) \times \bigl((A_2^k \times A_2^*) \cup (A_2^{<k}\$ \times A_2^*\$)\bigr), \]

together with a distinguished initial vertex and a set of distinguished
terminal vertices. Given such an automaton with vertex set Q, we define a binary
relation → on A_1^*$ × A_2^*$ × Q by

\[ (u_1\$, v_1\$, q_1) \to (u_2\$, v_2\$, q_2) \]

if and only if there exist words x_1, x_2, y_1, y_2, u′ and v′ in the appropriate
alphabets such that

\[ u_1 = x_1 u', \quad u_2 = x_2 u', \quad v_1 = y_1 v', \quad v_2 = y_2 v' \]

and (x_1, x_2, y_1, y_2) labels an edge from q_1 to q_2. If this holds we say that the
edge is applicable in the configuration (u_1$, v_1$, q_1). We call the automaton
deterministic if in each configuration (u, v, q) ∈ A_1^*$ × A_2^*$ × Q there is at
most one edge applicable.

Let →* denote the reflexive, transitive closure of the relation →. We say
that a pair (u, v) ∈ A_1^* × A_2^* is accepted by the automaton if there exists a
terminal state q_1 such that

\[ (u\$, v\$, q_0) \to^* (\$, \$, q_1) \]

where q_0 is the initial state. As usual, the relation accepted by the automaton
is the set of all pairs in A_1^* × A_2^* which are accepted by the automaton.
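
The behaviour of a deterministic prefix-rewriting automaton can be sketched as a simple loop which repeatedly replaces a prefix on each tape. The Python code below is an informal illustration rather than part of the formal development: the tuple encoding of edges, the function names and the step budget max_steps are assumptions introduced here, and the sketch raises an error if it meets a configuration with more than one applicable edge.

```python
END = "$"  # end-marker appended to both words


def run(edges, initial, terminals, u, v, max_steps=10_000):
    """Run a deterministic prefix-rewriting automaton on the pair (u, v).

    edges: tuples (source, x1, x2, y1, y2, target); applying one replaces the
    prefix x1 of the first word by x2 and the prefix y1 of the second by y2.
    Returns True on acceptance, False if the run gets stuck, and None if the
    (hypothetical) step budget is exhausted first.
    """
    cu, cv, state = u + END, v + END, initial
    for _ in range(max_steps):
        if cu == END and cv == END and state in terminals:
            return True
        options = [e for e in edges
                   if e[0] == state and cu.startswith(e[1]) and cv.startswith(e[3])]
        if len(options) > 1:
            raise ValueError("automaton is not deterministic in this configuration")
        if not options:
            return False
        _, x1, x2, y1, y2, state = options[0]
        cu = x2 + cu[len(x1):]
        cv = y2 + cv[len(y1):]
    return None


# Example with k = 1: the equality relation over {a, b}.
EQUALITY_EDGES = [
    ("q", "a", "", "a", "", "q"),
    ("q", "b", "", "b", "", "q"),
    ("q", END, END, END, END, "t"),
]
```

In this example no edge ever lengthens either word, so the configurations reached are never longer than the starting one. Calling run(EQUALITY_EDGES, "q", {"t"}, "ab", "ab") deletes matching first letters until both tapes hold only the end-marker, then moves to the terminal state t.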

Intuitively, a 2-tape prefix-rewriting automaton is very similar to a 2- 
pushdown automaton; the only essential difference is that we allow both 
stacks to be initialised with non-empty words, and view the automaton 
as accepting pairs of words and defining a relation instead of a language. 
As one might expect, such automata are extremely powerful, being easily 
seen to accept in particular any relation of the form L x {e} where L is a 
recursively enumerable language. However, we shall be interested in a more 
restricted class of such automata. We say that a prefix-rewriting automaton 
has bounded expansion if there exists a constant b ∈ ℕ such that whenever

\[ (u_1, v_1, q_1) \to^* (u_2, v_2, q_2) \]

we have |u_2| ≤ |u_1| + b and |v_2| ≤ |v_1| + b. We call such a value of b an
expansion bound for the automaton.

Note that the bounded expansion condition places a requirement on the 
contents of each store independently. This contrasts with the shrinking and 
length-reducing conditions on 2-pushdown automata, used to describe
growing context-sensitive and Church-Rosser languages [2], where a restriction
is applied to the total size of the 2 stores considered together. It transpires 
that our condition is a very strong one, in that a relation accepted by a 
prefix-rewriting automaton with bounded expansion is necessarily rational. 

Theorem 1. Any relation accepted by a [deterministic] 2-tape prefix-rewriting 
automaton with bounded expansion is a [deterministic] rational transduction. 
Moreover, given a [deterministic] 2-tape prefix-rewriting automaton and an 
expansion bound for it, one can effectively construct a [deterministic] trans- 
ducer recognising the same relation. 

Proof. Let M be a 2-tape k-prefix-rewriting automaton with bounded
expansion accepting a relation R ⊆ A_1^* × A_2^*, and let b ∈ ℕ be an expansion
bound for M. We construct from M a finite transducer N which simulates
M and so accepts R^$. Intuitively, the new transducer will read u and v,
buffering at least the first k characters of each in the finite state control.
Prefix-modification can thus be simulated by modifying only the contents
of the finite state control. Since a prefix-rewriting automaton can replace a
prefix with a longer one, it may be necessary to store more than k characters
of each word in the finite state control, but the expansion bound serves to
ensure that a buffer of some fixed size (namely k + b) will always suffice.

Formally, for i = 1, 2 we let C_i = A_i^{≤k+b} ∪ A_i^{<k+b}$ and let B_i be the set
of all words x ∈ C_i such that either |x| ≥ k or the final letter of x is $.
(Intuitively, C_i will be the set of all possible states for the buffer on tape i,
while B_i will be the set of "adequately populated" buffer states in which it
is not immediately necessary to read any more of the input word.)

We construct a transducer N as follows. The state set of N is C_1 × C_2 × Q
where Q is the state set of M. The initial state is (ε, ε, q_0) where q_0 is the
initial state of M. The terminal states are those of the form ($, $, q) with q
a terminal state of M. The edges are as follows:

(1) for every x ∈ C_1, y ∈ C_2 with x ∉ B_1, every a ∈ A_1^$ such that
xa ∈ C_1 and every state q, there is an edge from (x, y, q) to (xa, y, q)
with label (a, ε);

(2) for every x ∈ C_1, y ∈ C_2 with x ∈ B_1 but y ∉ B_2, every a ∈ A_2^$
such that ya ∈ C_2 and every state q, there is an edge from (x, y, q)
to (x, ya, q) with label (ε, a);

(3) for each edge in M from p to q with label (u_1, u_2, v_1, v_2) and each
x′, y′ such that u_1x′ ∈ B_1 and v_1y′ ∈ B_2, there is an edge from
(u_1x′, v_1y′, p) to (u_2x′, v_2y′, q) with label (ε, ε) provided u_2x′ ∈ C_1
and v_2y′ ∈ C_2.

Edges of types (1) and (2) serve simply to read the input words into the 
buffers until each contains sufficient data (at least k letters or the entire of 
the input if this is less), while edges of type (3) simulate the transitions of 
the prefix-rewriting automaton M by operating only on the buffers. 

Notice that once the transducer reaches a state in A_1^{<k+b}$ × C_2 × Q (that
is, one where the first buffer content contains the symbol $), it will always
remain in such a state, and will never again read from the first input word.
Similarly, once it reaches a state in C_1 × A_2^{<k+b}$ × Q it will always remain in
such a state and will never again read from the second input word. Noting
also that all the terminal states lie in both of these sets, it follows that all
pairs accepted by the transducer lie in A_1^*$ × A_2^*$.

We say that a configuration (u_1, v_1, q_1) has expansion bound (c, d) ∈ ℕ × ℕ
if whenever (u_1, v_1, q_1) →* (u_2, v_2, q_2) we have |u_2| ≤ |u_1| + c and |v_2| ≤
|v_1| + d. Note that the expansion bound condition on the automaton means
that (b, b) is an expansion bound for every configuration. We shall need the
following lemma.

Lemma 1. Suppose (u_1, v_1, q_1) →* (u_2, v_2, q_2) in the prefix-rewriting
automaton M. Suppose further that (u_1, v_1, q_1) has expansion bound (c_1, d_1)
and that u_1 = s_1s_1′, v_1 = t_1t_1′ where |s_1| ≤ k + b − c_1 and |t_1| ≤ k + b − d_1.
Then there exist factorisations u_2 = s_2s_2′ and v_2 = t_2t_2′ and an expansion
bound (c_2, d_2) for (u_2, v_2, q_2) such that |s_2| ≤ k + b − c_2, |t_2| ≤ k + b − d_2 and
the transducer N has a path from (s_1, t_1, q_1) to (s_2, t_2, q_2) with label (g, h)
where s_1′ = gs_2′ and t_1′ = ht_2′.

Proof. We use induction on the number of steps in the transition sequence
from (u_1, v_1, q_1) to (u_2, v_2, q_2). Certainly if (u_1, v_1, q_1) = (u_2, v_2, q_2) it
suffices to take s_2 = s_1, s_2′ = s_1′, t_2 = t_1, t_2′ = t_1′, c_2 = c_1, d_2 = d_1 and
g = h = ε.

Next we consider the one-step case, that is, the case in which (u_1, v_1, q_1) →
(u_2, v_2, q_2). Let g be the shortest prefix of s_1′ such that s_1g ∈ B_1; similarly,
let h be the shortest prefix of t_1′ such that t_1h ∈ B_2. It follows easily from the
definition that our transducer N has a path from (s_1, t_1, q_1) to (s_1g, t_1h, q_1)
with label (g, h).

Now since (u_1, v_1, q_1) → (u_2, v_2, q_2), by definition there exist words x_1,
x_2, y_1, y_2, u′ and v′ such that u_1 = x_1u′, u_2 = x_2u′, v_1 = y_1v′, v_2 = y_2v′
and (x_1, x_2, y_1, y_2) labels an edge from q_1 to q_2. Since |x_1|, |y_1| ≤ k we have
that x_1 and y_1 are prefixes of s_1g and t_1h respectively, say s_1g = x_1x′ and
t_1h = y_1y′. But now by the definition of our transducer, there is an edge
from (s_1g = x_1x′, t_1h = y_1y′, q_1) to (x_2x′, y_2y′, q_2) with label (ε, ε). Thus,
setting s_2 = x_2x′ and t_2 = y_2y′ and defining s_2′ and t_2′ accordingly, we obtain
a path from (s_1, t_1, q_1) to (s_2, t_2, q_2) with label (g, h).
Now we have

\[ x_2 x' s_2' = s_2 s_2' = u_2 = x_2 u' \]

so cancelling on the left we obtain u′ = x′s_2′. But now

\[ s_1 s_1' = u_1 = x_1 u' = x_1 x' s_2' = s_1 g s_2' \]

so cancelling again yields s_1′ = gs_2′ as claimed. An entirely similar argument
shows that t_1′ = ht_2′.

Next, notice that we have |u_1| − |u_2| = |s_1| − |s_2| and similarly |v_1| − |v_2| =
|t_1| − |t_2|. Set c_2 = c_1 + |s_1| − |s_2| and d_2 = d_1 + |t_1| − |t_2|. Clearly since
any state derivable from (u_2, v_2, q_2) is also derivable from (u_1, v_1, q_1) it is
readily verified that (c_2, d_2) is an expansion bound for (u_2, v_2, q_2). But now
we have

\[ |s_2| = |s_1| + c_1 - c_2 \le (k + b - c_1) + c_1 - c_2 = k + b - c_2 \]

and similarly |t_2| ≤ k + b − d_2 as required to complete the proof of the lemma
in the one-step case.

The inductive argument for the general case is now straightforward. □

Now if (u, v) is accepted by the prefix-rewriting automaton then by
definition we have (u$, v$, q_0) →* ($, $, q_t) where q_0 is the initial state and q_t is
some terminal state. Since the automaton has expansion bound b, the state
(u$, v$, q_0) has expansion bound (b, b). So taking u_1 = u$, v_1 = v$, q_1 = q_0,
q_2 = q_t, c_1 = d_1 = b, s_1 = t_1 = ε, s_1′ = u$ and t_1′ = v$ and applying Lemma 1,
our transducer has a path from (ε, ε, q_0) to (s_2, t_2, q_t) with label (g, h) where
s_2s_2′ = t_2t_2′ = $, u$ = s_1′ = gs_2′ and v$ = t_1′ = ht_2′.

Now either s_2 = ε and s_2′ = $, or s_2 = $ and s_2′ = ε. In the latter case we
have g = u$. In the former case we have g = u and there is clearly an edge
from (s_2, t_2, q_t) to (s_2$ = $, t_2, q_t) labelled ($, ε), so in either case there is a
path from (ε, ε, q_0) to ($, t_2, q_t) with label (u$, h). A similar argument deals
with the case that h = v, showing that in all cases there is a path from the
start state (ε, ε, q_0) to the terminal state ($, $, q_t) with label (u$, v$). Thus,
the transducer N accepts (u$, v$) as required.

Conversely, suppose (u$, v$) is accepted by our transducer. Then there
must be a path π from (ε, ε, q_0) to ($, $, q_t) for some initial state q_0 and
terminal state q_t. Now clearly π admits a unique decomposition of the form

\[ \pi = \lambda_0 \rho_1 \lambda_1 \rho_2 \cdots \rho_n \lambda_n \]

where each ρ_i is a single edge of type (3) and each λ_i is a (possibly empty)
path consisting entirely of edges of types (1) and (2). Clearly each ρ_i has
label (ε, ε). Suppose each λ_i has label (u_i, v_i); then clearly u$ = u_0u_1 … u_n
and v$ = v_0v_1 … v_n. Suppose that for 0 ≤ i ≤ n, after traversing the
initial segment of the path π up to and including λ_i, the automaton is in
configuration (x_i, y_i, q_i). Notice that, since the paths λ_i do not change the
state component, q_0 is consistent with its use above, and in particular is an
initial state in the prefix-rewriting automaton M. Similarly, q_n = q_t is a
terminal state of M. Now for 0 ≤ i ≤ n define

\[ c_i = x_i u_{i+1} u_{i+2} \cdots u_n \quad\text{and}\quad d_i = y_i v_{i+1} v_{i+2} \cdots v_n. \]

Clearly we have that x_0 = u_0 and y_0 = v_0, from which it follows that c_0 = u$
and d_0 = v$. We also have x_n = y_n = $ so that c_n = d_n = $.
Now it is straightforward to see that for 1 ≤ i ≤ n we have

\[ (c_{i-1}, d_{i-1}, q_{i-1}) \to (c_i, d_i, q_i) \]

so that

\[ (u\$, v\$, q_0) = (c_0, d_0, q_0) \to^* (c_n, d_n, q_n) = (\$, \$, q_t), \]

which by definition means that (u, v) is accepted by the 2-tape
prefix-rewriting automaton M. This completes the proof that the transducer N
accepts the relation R^$. It is easy to show that for any relation T, T is a
rational relation if and only if T^$ is a rational relation, so this suffices to
prove that R is a rational relation.

Finally, suppose that the original prefix-rewriting automaton M is
deterministic. We claim that the transducer N which we have constructed to
accept R^$ satisfies the conditions of Proposition 1, from which it will follow
that R is a deterministic rational relation, as required.

To this end, consider a state (x, y, q) in N. If x ∉ B_1 then it follows
immediately from the definition that all out-edges have labels of the form
(a, ε) with a ∈ A_1^$ and that there is exactly one such for each a ∈ A_1^$, so that
condition (i) holds. Similarly, if x ∈ B_1 but y ∉ B_2 then all out-edges have
labels of the form (ε, a) and there is exactly one such for each a ∈ A_2^$ so
condition (ii) holds.

Finally, suppose x ∈ B_1 and y ∈ B_2. From the definition of N, any edge
leaving (x, y, p) must have label (ε, ε). If there were more than one such
edge, then each would correspond to a different possible transition in M
from the state (x, y, p); but by the determinism assumption on M there can
only be one such transition, so this would give a contradiction. Thus we
deduce that there is at most one such edge, so that either condition (iii) or
condition (iv) holds. This completes the proof. □

We emphasise that Theorem 1 does not give a means to effectively
construct a transducer for a relation R starting only from a 2-tape
prefix-rewriting automaton with bounded expansion which accepts R. The
construction in the proof makes explicit use of the expansion bound for the
prefix-rewriting automaton, and it is not clear that one can effectively
compute an expansion bound from the automaton, even given the knowledge
that such a bound exists.

4. Automata for the Word Problem in Small Overlap Monoids 

The aim of this section is to show that the word problem for any C(4) 
monoid must be a deterministic rational relation. Throughout this section, 
we fix a monoid presentation ⟨A | R⟩ satisfying the condition C(4).

In [13] we presented an efficient recursive algorithm which can be used
to solve the word problem for such a presentation. For ease of reference
the algorithm is reproduced in Figure 1. It takes as input a piece of the
presentation p ∈ A* and two words u, v ∈ A* and outputs YES if u ≡ v
and p is a possible prefix of u (and hence also of v). Otherwise it outputs
NO. In particular, if p = ε then the algorithm outputs YES if u ≡ v and
NO if u ≢ v, thus solving the word problem for the presentation. See
[13, Lemma 5] and [13, Lemma 6] for proofs of correctness and termination
respectively.

WP-Prefix(u, v, p)

    if u = ε or v = ε
        then if u = ε and v = ε and p = ε
            then return YES
            else return NO
    elseif u does not have the form XYu′ with XY a clean overlap prefix
        then if u and v begin with different letters
            then return NO
        elseif p ≠ ε and u and p begin with different letters
            then return NO
        else
            u ← u with first letter deleted
            v ← v with first letter deleted
            if p ≠ ε
                then p ← p with first letter deleted
            return WP-Prefix(u, v, p)
    else
        let X, Y, u′ be such that u = XYu′
        if p is a prefix of neither X nor X̄
            then return NO
        elseif v does not begin either with XY or with X̄Ȳ
            then return NO
        elseif u = XYZu″ and v = XYZv″
            then if u″ is Z-active
                then return WP-Prefix(Zu″, Zv″, ε)
                else return WP-Prefix(Z̄u″, Z̄v″, ε)
        elseif u = XYu′ and v = XYv′
            then if p is a prefix of X
                then return WP-Prefix(u′, v′, ε)
                else return WP-Prefix(u′, v′, Z)
        elseif u = XYZu″ and v = X̄ȲZ̄v″
            then if u″ is Z-active
                then return WP-Prefix(Zu″, Zv″, ε)
                else return WP-Prefix(Z̄u″, Z̄v″, ε)
        elseif u = XYu′ and v = X̄ȲZ̄v″
            then return WP-Prefix(u′, Zv″, ε)
        elseif u = XYZu″ and v = X̄Ȳv′
            then return WP-Prefix(Z̄u″, v′, ε)
        elseif u = XYu′ and v = X̄Ȳv′
            then let z be the maximal common suffix of Z and Z̄
                 let z_1 be such that Z = z_1z
                 let z_2 be such that Z̄ = z_2z
                 if u′ does not begin with z_1 or v′ does not begin with z_2
                     then return NO
                     else let u″ be such that u′ = z_1u″
                          let v″ be such that v′ = z_2v″
                          return WP-Prefix(u″, v″, z)

Figure 1. Algorithm for the word problem of a C(4) presentation

The proof strategy for our main result is to show that this algorithm can 
be implemented on a deterministic 2-tape prefix-rewriting automaton with 
bounded expansion. The results of Section 3 then allow us to conclude that
the word problem is a deterministic rational relation. 

Theorem 2. Let ⟨A | R⟩ be a finite monoid presentation satisfying the small
overlap condition C(4). Then the relation

\[ \{(u, v) \in A^* \times A^* \mid u \equiv v\} \]

is deterministic rational and reverse deterministic rational. Moreover, one 
can, starting from the presentation, effectively compute 2-tape deterministic 
automata recognising this relation and its reverse. 

Proof. Let k be twice the maximum length of a relation word in the
presentation. We construct a deterministic 2-tape k-prefix-rewriting automaton
recognising the desired relation, and an expansion bound for this automaton.
By Theorem 1 this suffices to show that the given relation is deterministic
rational and that a 2-tape deterministic automaton for it can be effectively
constructed. Since the C(4) condition on the presentation is entirely
left-right symmetric, the claim regarding the reverse relation also follows.

Let P be the set of all pieces of the presentation ⟨A | R⟩, and let +
be a new symbol not in P. Recall that ε is by definition a piece of every
presentation, so certainly ε ∈ P. Let W = A^k ∪ A^{<k}$. We define a 2-tape
prefix-rewriting automaton with

• state set P ∪ {+};

• initial state ε;

• unique terminal state +;

and edges defined as follows.

(A) an edge from ε to + labelled ($, $, $, $).

(B) for every u ∈ W with u ≠ $ and such that u has no clean overlap
prefix of the form XY, and every v ∈ W such that v ≠ $ and
u and v begin with the same letter, a transition from p to p′
labelled (u, u′, v, v′) where u′, v′ and p′ are obtained from u, v and p
respectively by deleting the first letter.

In addition for every p ∈ P and u, v ∈ W such that u has a clean overlap
prefix (say XY) and p is a prefix of either X or X̄ or both, the automaton
may have an edge from p to another state in P as follows:

(CI) If u = XYZu", v = XYZv" and u" is Z-active, the automaton has 

an edge from p to e labelled (u, Zu" , v, Zv"). 
(C2) If u = XY Zu" , v = XYZv" and u" is not Z-active, the automaton 

has an edge from p to e labelled (u, Zu", v, Zv"). 
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(C3) If u = XYu', v = XYv' , u and v do not both have XYZ as a prefix, 
and p is a prefix of X, the automaton has an edge from p to e labelled 
(it, it', v, v'). 

(C4) If u = XYu' , v = XYv' , it and v do not both have XYZ as a prefix, 

and p is not a prefix of X, the automaton has an edge from p to Z 

with label (u,u' ,v,v'). 
(C5) If u = ZFZm", f = XYZv" and it" is Z-active, the automaton has 

an edge from p to e labelled (74, Zu" , v, Zv"). 
(C6) If u = XY Zu" , v = X7Zt;" and it" is not Z-active, the automaton 

has an edge from p to e labelled (it, Zit", y, Zv"). 
(C7) If u = XYu', v = XY Zv" and it does not have XY Z as a prefix, 

the automaton has an edge from p to e labelled (it, it', v, Zv"). 
(C8) If it = XYZu", v = XYu' and d does not have XYZ as a prefix, 

the automaton has an edge from p to e labelled (it, Zit", u, v'). 
(C9) If it = XYu', v = XYv' , u does not begin with XYZ, u does not 

begin with XYZ, z is the maximum common suffix of Z and Z, 

Z = z\z, Z = Z2Z, u' = z\u" , v' = Z2v" , the automaton has an edge 

from p to z labelled (it, it", v, v"). 

First, notice that this automaton is deterministic. Indeed, all edges leaving 
a given vertex $p \in P$ have labels of the form $(u, x, v, y)$ with $u, v \in W$. 
Notice that no member of the set $W$ is a prefix of another; it follows that no 
word has two distinct words in $W$ as prefixes, which means that the choice 
of prefixes $u$ and $v$ to act on is uniquely determined by the configuration in 
which the action is to be applied. Now it can be verified by examination 
that the various conditions on $u$, $v$ and $p$ which result in the inclusion of an 
edge from $p$ with label of the form $(u, x, v, y)$ are mutually exclusive, so that 
there is at most one such edge, and hence at most one transition applicable 
in any given configuration. 
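
The determinism argument can be checked on a very small case. The sketch below is only an illustration under loud assumptions: it takes a hypothetical presentation with alphabet {a, b} and no relations, and a hypothetical buffer size k = 2. With no relation words there are no clean overlap prefixes and the only piece is the empty word, so only edges of types A and B arise; edges of types C1 to C9 are not modelled. It is also assumed that deleting the first letter of the empty piece leaves it empty. The function names are invented for this example.

```python
from itertools import product

END = "$"  # end-of-tape marker, written $ in the text


def buffer_words(alphabet, k):
    """Return W: all words of length k, plus shorter words followed by $."""
    full = {"".join(t) for t in product(alphabet, repeat=k)}
    short = {"".join(t) + END
             for n in range(k) for t in product(alphabet, repeat=n)}
    return full | short


def is_prefix_free(words):
    """True if no word in the set is a proper prefix of another."""
    return not any(a != b and b.startswith(a) for a in words for b in words)


def buffer_prefix(tape, k):
    """The unique prefix of a $-terminated tape that lies in W."""
    head = tape[:k]
    cut = head.find(END)
    return head if cut == -1 else head[:cut + 1]


def build_edges(alphabet, k):
    """Edges of types A and B only (assumption: no relations, so P = {""})."""
    W = buffer_words(alphabet, k)
    edges = {("", END, END): (END, END, "+")}          # type A
    for u in W:
        for v in W:
            if u != END and v != END and u[0] == v[0]:  # type B
                edges[("", u, v)] = (u[1:], v[1:], "")
    return edges


def accepts(u, v, alphabet="ab", k=2):
    """Run the toy automaton from (u$, v$, empty piece); True on ($, $, +)."""
    edges = build_edges(alphabet, k)
    tape1, tape2, state = u + END, v + END, ""
    while True:
        if (tape1, tape2, state) == (END, END, "+"):
            return True
        if state == "+":
            return False
        pu, pv = buffer_prefix(tape1, k), buffer_prefix(tape2, k)
        if (state, pu, pv) not in edges:
            return False  # non-terminal configuration with no transition
        new_u, new_v, state = edges[(state, pu, pv)]
        # Rewrite the chosen prefixes; the rest of each tape is untouched.
        tape1 = new_u + tape1[len(pu):]
        tape2 = new_v + tape2[len(pv):]
```

Because W is prefix-free, `buffer_prefix` never has a choice to make, which is the point of the argument above. In this relation-free case the automaton simply compares the two words letter by letter.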

It is now an entirely routine matter to prove by induction that for every 
piece $p \in A^*$ and words $u, v \in A^*$ we have 

$$(u\$, v\$, p) \to^* (\$, \$, +)$$

if and only if the algorithm outputs YES, that is, if and only if $u = v$ and $p$ 
is a possible prefix of $u$. Transitions of types B, C1, C2, C3, C4, C5, C6, C7, 
C8 and C9 correspond to the recursive calls at lines 15, 24, 25, 28, 29, 32, 33, 
35, 37, 46 respectively, while a transition of type A corresponds to termination 
with the answer YES at line 3 of the algorithm. The conditions under which 
the algorithm terminates with the answer NO (at lines 4, 7, 9, 19, 21 and 
43) all correspond to non-terminal configurations of the automaton in which 
no transitions are applicable. It follows from [13, Lemma 7] that the tests for 
clean overlap prefixes and $Z$-activity on the buffer contents are equivalent 
to performing the corresponding tests on the whole of the remaining input, 
as demanded by the algorithm. 

In particular, we have 

$$(u\$, v\$, \epsilon) \to^* (\$, \$, +)$$

if and only if $u = v$, as required to show that our prefix-rewriting automaton 
solves the word problem. It remains only to find an expansion bound for 
the automaton. Let $b$ be the length of the longest relation word in the 
presentation $\langle A \mid R \rangle$. 

Suppose $(u_0, v_0, q_0) \to^* (u_1, v_1, q_1)$ and suppose that $u_0 = z_0 u_0'$ and 
$v_0 = z_0 v_0'$ where $z_0$ is either a proper suffix of a relation word or the empty word. 
We claim that there are factorisations $u_1 = z_1 u_1'$ and $v_1 = z_1 v_1'$ where $z_1$ is a 
proper suffix of a relation word or the empty word, $|u_1'| \leq |u_0'|$ and $|v_1'| \leq |v_0'|$. 

We consider first the one-step case, that is, where $(u_0, v_0, q_0) \to (u_1, v_1, q_1)$. 
If the transition from $(u_0, v_0, q_0)$ to $(u_1, v_1, q_1)$ is of type A or B then the 
claim is clear, so suppose the transition is of type C1-C9. Then from the 
definitions of these transitions, we must have $u_0 = XYu'$ for some maximum 
piece prefix $X$ and middle word $Y$ of a relation word $XYZ$. Now $XY$ 
cannot be a piece, so it cannot be a prefix of $z_0$, which is a proper suffix 
of a relation word. Thus, we must have $|XY| > |z_0|$ and hence $|u'| < |u_0'|$. 
Looking again at the definitions of the transitions, we see that $u_1$ and $v_1$ 
either 

(i) are (not necessarily proper) suffixes of $u'$ and $v'$ respectively; or 

(ii) have the form $u_1 = Zu''$ and $v_1 = Zv''$ where $u''$ and $v''$ are (not 
necessarily proper) suffixes of $u'$ and $v'$ respectively; or 

(iii) have the form $u_1 = \overline{Z}u''$ and $v_1 = \overline{Z}v''$ where $u''$ and $v''$ are (not 
necessarily proper) suffixes of $u'$ and $v'$ respectively. 

In case (i) it suffices to set $z_1 = \epsilon$ and $u_1' = u_1$. In case (ii) [respectively, 
case (iii)] it suffices to set $z_1 = Z$ [respectively, $z_1 = \overline{Z}$] and $u_1' = u''$, noting 
that $Z$ [respectively, $\overline{Z}$] must be a proper suffix of a relation word since it is a 
maximal piece suffix of $XYZ$ [$\overline{X}\,\overline{Y}\,\overline{Z}$] and no relation word can be a piece. 
It now follows easily by induction that the claim also holds when 

$$(u_0, v_0, q_0) \to^* (u_1, v_1, q_1).$$

In particular, taking $z_0 = \epsilon$ and $u_0' = u_0$ and then writing $u_1 = z_1 u_1'$ as 
above we have 

$$|u_1| = |z_1| + |u_1'| \leq |z_1| + |u_0'| = |z_1| + |u_0| < |u_0| + b$$

and similarly $|v_1| < |v_0| + b$, as required to show that the automaton has 
expansion bound $b$. □ 
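
The strict inequality used in the one-step case is a plain length count. Written out, as a restatement of the step above that uses only the two factorisations of the first tape contents (here called $u_0$) and introduces no new assumptions:

```latex
\begin{align*}
u_0 = XYu' = z_0 u_0'
  &\implies |u'| = |u_0| - |XY|
   \quad\text{and}\quad |u_0'| = |u_0| - |z_0|, \\
|XY| > |z_0|
  &\implies |u'| = |u_0| - |XY| < |u_0| - |z_0| = |u_0'|, \\
u'' \text{ a suffix of } u'
  &\implies |u''| \leq |u'| < |u_0'|.
\end{align*}
```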

As an immediate corollary we obtain a corresponding statement for 
semigroups. 

Corollary 1. Let $\langle A \mid R \rangle$ be a finite semigroup presentation satisfying the 
small overlap condition C(4). Then the relation 

$$\{(u, v) \in A^+ \times A^+ \mid u = v\}$$

is deterministic rational and reverse deterministic rational. Moreover, one 
can, starting from the presentation, effectively compute 2-tape deterministic 
automata recognising this relation and its reverse. 

Proof. Since the presentation has no empty relation words, the semigroup 
with presentation $\langle A \mid R \rangle$ arises as the subsemigroup of non-identity 
elements in the monoid with presentation $\langle A \mid R \rangle$. It follows that 

$$\{(u, v) \in A^+ \times A^+ \mid u = v\} = \{(u, v) \in A^* \times A^* \mid u = v\} \setminus \{(\epsilon, \epsilon)\}.$$

Now it is easy to verify that a relation $R$ between free monoids is a 
deterministic rational relation only if $R \setminus \{(\epsilon, \epsilon)\}$ is a deterministic rational relation 
between free semigroups, so the result follows from Theorem 2. □ 

5. Consequences 

In this section we consider a number of interesting consequences and 
corollaries of Theorem 2. We begin with some terminology from language theory. 

Let $A$ be a finite alphabet, and choose some arbitrary total order $\leq$ on 
the letters of $A$. Recall that the corresponding lexicographic order is an 
extension of this order to a total order $\leq_L$ on the free monoid $A^*$, defined 
inductively by $\epsilon \leq_L w$ for all $w$, and for all $x, y \in A$ and $u, v \in A^*$ we have 
$xu \leq_L yv$ if either $x \neq y$ and $x \leq y$, or $x = y$ and $u \leq_L v$. Lexicographic 
order is a total order but not (unless $|A| = 1$) a well-order, since it contains 
infinite descending chains such as 

$$b, ab, aab, aaab, \ldots, a^i b, \ldots$$

Hence, if $R$ is an equivalence relation on $A^*$ (even a rational one) there is no 
guarantee that every equivalence class of $R$ will contain a lexicographically 
minimal element. In the case that $R$ is locally finite (that is, each 
equivalence class is finite), however, every class must clearly contain a unique 
lexicographically minimal element, and the set of elements which are 
minimal in their class forms a cross-section of the relation, that is, a language 
of unique representatives for the equivalence classes of the relation; we shall 
call these representatives lexicographic normal forms. Remmers showed that 
if $\langle A \mid R \rangle$ is a C(3) monoid [semigroup] presentation then the corresponding 
equivalence relation on $A^*$ [respectively, $A^+$] is locally finite [8, 17]; it 
follows that every element of a C(3) monoid has a lexicographic normal form. 
Johnson [11, 12] showed that if $R$ is a deterministic rational locally finite 
equivalence relation then the function which maps each word to the 
corresponding lexicographic normal form can be computed by a deterministic 
transducer. Thus, we obtain the following corollary to Theorem 2. 

Corollary 2. Let $\langle A \mid R \rangle$ be a monoid presentation satisfying C(4) and 
suppose $A$ is equipped with a total order. Then the relation 

$$\{(u, v) \in A^* \times A^* \mid u = v \text{ and } v \text{ is a lexicographic normal form}\}$$

is a deterministic rational function. 
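
The definitions leading up to Corollary 2 can be made concrete with a brute-force sketch. It implements the inductive definition of the lexicographic order directly, exhibits the start of the descending chain, and finds the lexicographic normal form of a word by enumerating its (finite) equivalence class. This is not the deterministic transducer of the corollary, only an exhaustive search. The relation ab = ba used in the tests is a hypothetical toy chosen because its classes are tiny; it is not a C(4) presentation. All function names are invented for the example.

```python
from collections import deque


def lex_le(u, v, rank):
    """u <=_L v, following the inductive definition; rank orders the letters."""
    if u == "":
        return True           # the empty word lies below every word
    if v == "":
        return False
    x, y = u[0], v[0]
    if x != y:
        return rank[x] <= rank[y]
    return lex_le(u[1:], v[1:], rank)


def equivalence_class(word, relations, limit=10000):
    """All words reachable from `word` by applying relations in either direction.

    Only terminates with the full class when the class is finite; `limit`
    is a safety guard added for this sketch.
    """
    seen, queue = {word}, deque([word])
    while queue:
        w = queue.popleft()
        for lhs, rhs in relations:
            for a, b in ((lhs, rhs), (rhs, lhs)):
                start = w.find(a)
                while start != -1:
                    nxt = w[:start] + b + w[start + len(a):]
                    if nxt not in seen:
                        if len(seen) >= limit:
                            raise RuntimeError("class too large for this sketch")
                        seen.add(nxt)
                        queue.append(nxt)
                    start = w.find(a, start + 1)
    return seen


def normal_form(word, relations, rank):
    """Lexicographically minimal element of the class of `word`."""
    best = None
    for w in equivalence_class(word, relations):
        if best is None or lex_le(w, best, rank):
            best = w
    return best
```

For instance, with `rank = {"a": 0, "b": 1}` the call `lex_le("ab", "b", rank)` holds while `lex_le("b", "ab", rank)` does not, matching the first step of the chain b, ab, aab, and so on.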

The image of a rational function is always a regular language [1, 
Corollary II.4.2] and deterministic rational functions can be computed in linear 
time (Johnson [12, Theorem 5.1]), so we have: 

Corollary 3. Let $\langle A \mid R \rangle$ be a monoid presentation satisfying C(4) and 
suppose $A$ is equipped with a total order. Then the lexicographic normal 
forms comprise a regular language of unique representatives for elements of 
the monoid. Moreover, there is an algorithm which, given a word $w$ in $A^*$, 
computes in linear time the corresponding lexicographic normal form. 

A monoid $M$ is called rational [19, 16] if there exists a finite generating 
set $A$ for $M$ and a regular cross-section $L \subseteq A^*$ for $M$ such that the normal 
forms in $L$ are computed by a transducer. 

Corollary 4. Every monoid admitting a C(4) presentation is rational. 

Recall that the rational subsets of a monoid $M$ are those which can be 
obtained from finite subsets by the operations of union, product and 
submonoid generation (the "Kleene star" operation). If $M$ is generated by a 
finite subset $A$ then the rational subsets of $M$ are exactly the images in 
$M$ of regular languages over $A$, which means they have natural finite 
representations as finite automata over $A$. The recognisable subsets of $M$ are 
the homomorphic pre-images in $M$ of subsets of finite monoids. In the case 
that $M$ is a free monoid, the rational subsets are just the regular languages. 
Kleene's Theorem asserts that the rational subsets of a free monoid (that 
is, the regular languages) coincide with the recognisable subsets [10]. More 
generally, a monoid in which the rational and recognisable subsets coincide 
is called a Kleene monoid, or sometimes is said to satisfy Kleene's 
Theorem. Rational monoids were originally introduced in an attempt to obtain a 
concrete characterisation of Kleene monoids [19], and indeed every rational 
monoid is a Kleene monoid (although it transpires that the converse does 
not hold). Thus, we obtain: 

Corollary 5 (Kleene's Theorem for Small Overlap Monoids). Let $M$ be a 
monoid or semigroup admitting a C(4) presentation, and $S$ a subset of $M$. 
Then $S$ is rational if and only if $S$ is recognisable. 

Recall that a collection of subsets of some given base set is called a boolean 
algebra if it contains the empty set and is closed under union, intersection 
and complement. As another corollary of the rationality of $M$ we obtain the 
following fact about rational subsets of $M$. 

Corollary 6. Let $M$ be a monoid admitting a C(4) presentation $\langle A \mid R \rangle$. 
Then the rational subsets of $M$ form a boolean algebra. Moreover, if rational 
subsets of $M$ are represented by automata over $A$, then the operations of 
union, intersection and complement are effectively computable. 

Proof. Let $\sigma : A^* \to M$ be the canonical morphism mapping $A^*$ onto $M$, 
and let 

$$\rho = \{(u, v) \in A^* \times A^* \mid u = v \text{ and } v \text{ is a lexicographic normal form}\}.$$

Suppose $X, Y \subseteq M$ are rational subsets, with say $X = \hat{X}\sigma$ and $Y = \hat{Y}\sigma$ 
where $\hat{X}, \hat{Y} \subseteq A^*$ are regular languages. Then using the facts that $A^*\rho$ 
contains a unique representative for every element and that $\rho\sigma = \sigma$, it is 
readily verified that $M \setminus X = (A^*\rho \setminus \hat{X}\rho)\sigma$, $X \cap Y = (\hat{X}\rho \cap \hat{Y}\rho)\sigma$ and 
$X \cup Y = (\hat{X}\rho \cup \hat{Y}\rho)\sigma$. The result now follows from the fact that regular languages in a 
free monoid form a boolean algebra with effectively computable operations. □ 
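
The role of the normal forms in this proof can be seen on finite languages. Intersecting the representing languages directly may lose elements, because two different words can represent the same element. Passing to normal forms first avoids this. The sketch below is an illustration only: it uses the free commutative monoid on {a, b}, where sorting the letters gives the lexicographic normal form for a before b. That monoid is a hypothetical stand-in picked for its easy normal forms; it is not a C(4) monoid, and the languages are finite sets rather than automata.

```python
def nf(word):
    """Normal form in the toy monoid: letters sorted (assumes order a < b)."""
    return "".join(sorted(word))


def image(language):
    """Elements represented by a finite language, each named by its normal form."""
    return frozenset(nf(w) for w in language)


def intersect_via_normal_forms(lang_x, lang_y):
    """A language of normal forms representing the intersection of the images."""
    return {nf(w) for w in lang_x} & {nf(w) for w in lang_y}


# Two representing languages with no word in common ...
X_hat = {"ab", "aab"}
Y_hat = {"ba", "bb"}
naive = X_hat & Y_hat
# ... whose images nevertheless share the element represented by ab and ba.
via_nf = intersect_via_normal_forms(X_hat, Y_hat)
```

Here `naive` is empty, while `via_nf` contains the single normal form `"ab"` and its image equals the intersection of the two images.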

Recall that the rational subset membership problem for a finitely 
generated monoid $M$ is the problem of deciding, given a rational subset of $M$ 
(represented by a finite automaton over some fixed generating set for $M$) 
and an element of $M$ (represented as a word over the same generating set), 
whether the given element belongs to the given subset. The decidability of 
this problem is independent of the chosen generating set [14, Corollary 3.4]. 

Corollary 7. Any monoid admitting a C(4) presentation has decidable 
rational subset membership problem (and hence decidable submonoid 
membership problem). 

Proof. Suppose $M$ has C(4) presentation $\langle A \mid R \rangle$, and let $\sigma : A^* \to M$ 
be once again the canonical morphism. Suppose we are given a finite 
automaton recognising a language $X \subseteq A^*$ (representing the rational subset 
$X\sigma \subseteq M$) and a word $w \in A^*$ (representing the element $w\sigma \in M$). Certainly 
we can compute from the latter a finite automaton recognising the singleton 
language $\{w\}$. Hence, by Corollary 6 we can compute a finite automaton 
recognising a language $Y \subseteq A^*$ such that $Y\sigma = X\sigma \cap \{w\}\sigma$. But $w\sigma \in X\sigma$ 
if and only if $X\sigma \cap \{w\}\sigma$ is non-empty, so this reduces the problem to 
deciding emptiness of the regular language $Y$; the latter is well known to be 
decidable. □ 
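
The final step of this proof, deciding emptiness of a regular language, amounts to a reachability search in the automaton. A minimal sketch follows, assuming the automaton is given as a nested dictionary from states to letters to sets of successor states; this representation and the function name are choices made for the example, not taken from the paper.

```python
from collections import deque


def is_empty(start, accepting, transitions):
    """True if no accepting state is reachable from `start`.

    transitions: dict mapping state -> dict mapping letter -> set of states.
    """
    seen, queue = {start}, deque([start])
    while queue:
        q = queue.popleft()
        if q in accepting:
            return False      # some word is accepted
        for targets in transitions.get(q, {}).values():
            for r in targets:
                if r not in seen:
                    seen.add(r)
                    queue.append(r)
    return True
```

The search visits each state at most once, so it runs in time linear in the size of the automaton.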

A monoid $M$ is called asynchronous automatic (see, for example, [9]) if 
there exists a finite generating set $A$ and a regular language $L \subseteq A^*$ such 
that $L$ contains a representative for every element of $M$, and the relation 

$$\{(u, v) \in A^* \times A^* \mid ua = v\}$$

is a rational transduction for each $a \in A$ and for $a = \epsilon$. It has been shown 
[9, Theorem 6.2] that rational monoids are asynchronous automatic, so we 
also obtain the following. 

Corollary 8. Every monoid admitting a C(4) presentation is asynchronous 
automatic. 

We have already remarked that small overlap conditions are the natural 
semigroup-theoretic analogue of the small cancellation conditions 
extensively used in combinatorial group theory (see, for example, [15]). It is well 
known that a group admitting a finite presentation satisfying sufficiently 
strong small cancellation conditions is word hyperbolic in the sense of 
Gromov [7]. The usual geometric definition of a word hyperbolic group has 
no obvious counterpart for more general monoids or semigroups; however, 
Gilman [6] has given a language-theoretic characterisation of word 
hyperbolic groups. Specifically, he showed that a group is word hyperbolic if and 
only if it admits a finite generating set $A$ and a regular language $L \subseteq A^*$ 
containing a representative for every element of the group such that the multiplication 
table 

$$\{u \# v \# w^R \mid uv = w\}$$

is a context-free language, where $\#$ is a new symbol not in $A$ and $w^R$ 
denotes the reverse of $w$. Motivated 
by this result, Duncan and Gilman [3] have suggested calling a monoid 
word hyperbolic if it satisfies this language-theoretic condition. Since every 
rational monoid is word hyperbolic [9, Theorem 6.3] we can deduce that 
every C(4) monoid is word hyperbolic in this sense. 
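
The reversal in the multiplication table is what makes a pushdown check possible: the product can be pushed onto a stack and then matched against the reversed third component as it is popped. The sketch below shows this for the simplest possible case, the free monoid with every word taken as its own representative, where the condition on the product is literal equality of words. That setting is an assumption made for illustration; in a C(4) monoid the condition refers to equality in the monoid and the representatives come from the normal form language. The function name and the use of # as a separator character are choices for this example.

```python
def in_multiplication_table(s, sep="#"):
    """Check that s has the form u#v#x with x the reverse of uv (free monoid)."""
    parts = s.split(sep)
    if len(parts) != 3:
        return False
    u, v, w_rev = parts
    stack = list(u + v)           # push u, then v
    for letter in w_rev:          # read the reversed product left to right
        if not stack or stack.pop() != letter:
            return False
    return not stack              # every pushed letter must be matched
```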

Corollary 9. Every monoid admitting a C(4) presentation is word 
hyperbolic in the sense of Duncan and Gilman (and furthermore admits a 
hyperbolic structure with unique representatives). 